A data format enabling interoperation of speech recognition, translation and information extraction engines: the GALE type system

Authors

  • John F. Pitrelli
  • Burn L. Lewis
  • Edward A. Epstein
  • Jerome L. Quinn
  • Ganesh N. Ramaswamy

Abstract

Live interoperation of several speech- and text-processing engines is key to tasks such as real-time cross-language story segmentation, topic clustering, and captioning of video. One requirement for interoperation is a common data format shared across engines, so that the output of one can be understood as the input of another. The GALE Type System (GTS) has been created to serve this purpose for interoperating language-identification, speaker-recognition, speech-recognition, named-entity-detection, translation, story-segmentation, topic-clustering, summarization, and headline-generation engines in the context of the Unstructured Information Management Architecture (UIMA). GTS includes types designed to bridge across the domains of these engines, for example, linking the text-only domain of translation to the time-domain types needed for speech processing, and the monolingual domain of information-extraction engines to the cross-language types needed for translation.
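The bridging idea in the abstract, letting text-only and cross-language engines refer back to time-stamped speech output, can be illustrated with a minimal Python sketch. Every name here (`TimedToken`, `TextSpan`, `AlignmentLink`, `build_transcript`, `time_of_target_span`) is hypothetical and is not taken from the GTS definitions; the sketch assumes token times in seconds and half-open character offsets, loosely following the begin/end offsets of UIMA annotations.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical illustration only: these are NOT the actual GTS type definitions.


@dataclass
class TimedToken:
    """Speech domain: a recognized word with a time span (seconds, assumed)."""
    text: str
    start_sec: float
    end_sec: float


@dataclass
class TextSpan:
    """Text domain: half-open [begin, end) character offsets into a string."""
    begin: int
    end: int


@dataclass
class AlignmentLink:
    """Cross-language bridge: a target-language span and the source tokens it translates."""
    target: TextSpan
    source_token_indices: List[int]


def build_transcript(tokens: List[TimedToken]) -> Tuple[str, List[TextSpan]]:
    """Join timed tokens into plain text for text-only engines, keeping one
    TextSpan per token so each character offset can be traced back to a time."""
    pieces: List[str] = []
    spans: List[TextSpan] = []
    pos = 0
    for i, tok in enumerate(tokens):
        if i > 0:  # a single space separates consecutive tokens
            pieces.append(" ")
            pos += 1
        spans.append(TextSpan(pos, pos + len(tok.text)))
        pieces.append(tok.text)
        pos += len(tok.text)
    return "".join(pieces), spans


def time_of_target_span(link: AlignmentLink,
                        tokens: List[TimedToken]) -> Optional[Tuple[float, float]]:
    """Map a translated span back to the audio interval covered by its aligned
    source tokens; return None if the span is aligned to no source token."""
    if not link.source_token_indices:
        return None
    aligned = [tokens[i] for i in link.source_token_indices]
    return (min(t.start_sec for t in aligned), max(t.end_sec for t in aligned))
```

For example, in this sketch, with source tokens `akhbar` (0.0 to 0.5 s) and `al-yawm` (0.5 to 1.2 s) and a hypothetical English rendering `today's news`, a link from the span `news` (offsets 8 to 12) to source token 0 maps back to the interval (0.0, 0.5). That time range is the kind of information a captioning engine would need to place translated text on the video timeline.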

Similar resources

Aggregating distributed STT, MT, and information extraction engines: the GALE interoperability-demo system

Natural-language-processing engines are now attaining accuracy sufficient to begin combining them to perform more complex tasks. The GALE Interoperability Demo system consists of 12 engines including speech recognition, translation, and various information extraction engines, interoperated to make Arabic news video browsable as English text grouped and summarized by topic. Unstructured Informat...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into computers and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Explicit and Implicit Requirements of Technology Evaluations: Implications for Test Data Creation

A multitude of approaches, methodologies and metrics exist for evaluating the performance of technologies like machine translation, speech recognition and information extraction. While metrics vary widely in their assumptions about what is being tested and how it should be measured, most technology evaluations rely crucially on a carefully constructed test data set that is both accurate and ful...

Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space

Designing new methods for extracting features from the speech signal and combining the information they yield is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

Publication date: 2008